ABSTRACT



This paper revisited the findings of a study of school 
effectiveness changes by J. Gray and others (1995) and compared them to 
findings from two other studies, one by J. Freeman and C. Teddlie (1996) and 
the other conducted for this report. In the study by Gray and others, the 
researchers used data from three cohorts of secondary school students in 
Great Britain. The outcome measure was a national examination. Complete data 
were obtained for 7,829 students from 30 different schools. In the study by 
Freeman and Teddlie, the school effectiveness indicator was established with 
a regression model that used a composite student achievement score as the 
criterion variable and two predictor variables, student socioeconomic status 
and community type. Data were obtained for 634 students. In the current 
study, Scholastic Assessment Test scores from each school were used as 
indicators of school effectiveness. There were many differences among these 
studies, but some conclusions can be drawn from the results. The range of 
percentages for schools that change, as predicted by Gray and others, 
one-fifth to one-fourth, with roughly half improving and half declining, was 
similar to that found for the other two studies, strengthening the notion 
that in a given set of schools, it is predictable how many will be changing. 
Differences do suggest that the criteria and methods for establishing school 
effectiveness indicators will produce unlike results. All three studies 
suggest that less than 20% of schools will improve over time, and it would be 
rare that a school would move from the bottom quarter to the top over a 
3-year period. All three studies also suggest that 20% of schools decline 
over time. A close look at the school effectiveness indicators suggests that, 
while the majority of schools remain stable in effectiveness levels over 
time, this is not a linear process but one that had fluctuations over time. 
The study also suggests that the schools in Alabama, although involved in a 
statewide accountability program, were not improving at a greater rate than 
schools in the other studies that were not involved in an 
accountability/improvement program. An appendix contains an excerpt from the 
Alabama Administrative Code. (Contains 63 references.) (SLD) 
Introduction

For the past ten years or so, the research areas of school effectiveness and school 
improvement have been engaged in a slow, but deliberate process of merging their 
divergent philosophies and methodologies (e.g., Freeman, 1997; Gray, Jesson, Goldstein, 
Hedger, & Rasbash, 1995; Gray, Hopkins, Reynolds, Wilcox, Farrell, & Jesson, 1999; 
Reynolds, Hopkins, & Stoll, 1993). Stability, or how constant the measures of school 
effectiveness are across different points in time (Crone, Lang, Franklin, & Halbrook, 
1994), is one aspect of school effectiveness research that is antithetical to school 
improvement research. In an attempt to explore school improvement from the 
perspective of school effectiveness, Gray et al. (1995) set about to answer the question, 
how much do schools change in terms of their effectiveness over a number of years? Their 
results are presented later in this paper, along with the results of two other recent 
studies, to determine whether change occurs consistently over time or, more to the point, 
whether change can be anticipated. 

In this effort to merge the disciplines of school effectiveness research and school 
improvement research, a major effort must be made to reconcile the issue of stability vs. 
change. Gray et al. (1995) had this in mind when they sought to determine the degree of 
change in effectiveness levels over time. They found that roughly two-thirds of the 
schools failed to change, while one-third changed (either improving or declining) in 
terms of effectiveness. 

Purpose of the Study 

The intent of this paper is to revisit the Gray et al. (1995) study, take the findings, 
and compare them to the results of school change in two other venues. One of the 
suggestions from the original study was that this degree of change over time might be 
impacted by the method of measuring the levels of effectiveness. In order to examine 
this idea, the two other studies in this paper will be presented, despite the fact that both of 
these studies differ in terms of sample size, school configuration, and methodology. 
Regardless of these differences, the present study will partially replicate the Gray et al. 
(1995) study to determine the degree of change in effectiveness levels over time using 
these differences as a type of independent variable, although we are not seeking to obtain 
any statistical significance, or hypothesis-testing results. It is simply the purpose of this 
study to compare these results for differences in degree of change in effectiveness over 
time, as measured by two different indicators of effectiveness. 

Significance of the Study 

It is the intent of the present study to partially replicate the Gray et al. (1995) 
study seeking to measure the degree of change in effectiveness levels over time, using 
divergent designs. By doing this, we hope to build upon the previous study and begin to 
answer some of the questions raised by the researchers in that study. Therefore, it is our 
belief that the significance of the present study lies in the fact that it is another step 
toward the merger of school effectiveness and school improvement research disciplines. 
By measuring school improvement in terms of school effectiveness indicators, we hope 
to help alleviate the contradiction concerning stability vs. change in the field. 

Background Literature 

Beginning in the 1950s, and increasingly in the 1960s, federal funding of public 
education increased dramatically. With this increase in federal support, policymakers 
increased the emphasis on evaluating the product of education. The first major effort by 
the federal government to evaluate public education came in the Coleman Report 
(Coleman, Campbell, Hobson, McPartland, Mood, Weinfeld, & York, 1966), the results 
of which, interpreted erroneously, suggested that public schools did not affect the 
educational success of students. Jencks and his colleagues (1972) followed these 
findings and added to the controversy by reporting that schools do not have as much 
impact on students’ cognitive achievement as family characteristics and social class. 
These findings were interpreted to mean that school reform in curriculum, instruction, 
and financial expenditures would not favorably affect the achievement of students as 
measured by objective tests (Jencks et al. 1972). 

The Coleman and Jencks Reports were troubling to educational policymakers who 
had a personal stake in governmental funding of public education. They feared that 
Congress might react to these findings by asking, “why spend billions of dollars on 
public education if schools do not make a measurable difference in the academic 
performance of their students?” In responding to the Coleman and Jencks Reports, a cadre 
of researchers harshly criticized their research methods. By correcting these 
methodological problems, they hoped to prove that schools could make a difference in a 
student’s academic performance (e.g., Edmonds, 1979; Reynolds, 1976; Weber, 1971). 
Four major studies were conducted in the United States and Great Britain that disputed 
the findings of Coleman and Jencks (and a similar report in Great Britain, known as the 
Plowden Report), by concluding that certain school effects did have a measurable impact 
on student achievement (Brookover, Beady, Flood, Schweitzer, & Wisenbaker, 1979; 
Mortimore, Sammons, Stoll, Lewis, & Ecob, 1988; Rutter, Maughan, Mortimore, & 
Ouston, with Smith, 1979; Teddlie, Falkowski, Stringfield, Desselle, & Garvue, 1984). 
These "responsive" studies laid the foundation in the late 1970s and early 1980s for what 
became known as school effectiveness research. 

School Effectiveness Research 

The emphasis of early school effectiveness research was to identify the 
characteristics of an effective school. Edmonds (1979) summarized these characteristics 
as: (a) strong educational leadership; (b) high expectations for student achievement; (c) 
an emphasis on basic skills; (d) a safe and orderly climate; and (e) frequent evaluations of 
pupils’ progress. This emphasis on listing effective school characteristics caused the 
simultaneous development of school improvement research. Early school improvement 
research sought to use the characteristics of effective schools, such as the Five-Factor 
Model of Edmonds, by transplanting them to ineffective schools. The results were mixed 
due in some cases to a lack of consideration of what Elmore (1978) and McLaughlin 
(1978) called “mutual adaptation.” Mutual adaptation occurs when the interaction 
between a given environment and a plan to produce change in that environment results in 
a program outcome that differs from the one originally intended (Purkey & Smith, 1983). 

Since most of the early school effectiveness research was carried out in low-SES, 
inner-city schools, Edmonds called for the creation of ‘effective schools for the urban 
poor’ in those specific environments (Edmonds, 1979). It was at this point that school 
improvement studies (e.g., Clark & McCarthy, 1983; McCormack-Larkin, 1985; 
McCormack-Larkin & Kritek, 1982; Taylor, 1990) began to flourish and surpass school 
effectiveness research in the U.S. However, the equity orientation in school effectiveness 
research, with its emphasis on school improvement and its obvious sampling biases led to 
criticism from the traditional educational scientific community (e.g., Cuban, 1983, 1984; 
Firestone & Herriott, 1982; Good & Brophy, 1986; Purkey & Smith, 1983; Rowan, 1984; 
Rowan, Bossert, & Dwyer, 1983). This criticism resulted in a figurative wedge being 
driven between the school effectiveness and school improvement areas of research in the 
U.S. (Teddlie, 1994). 

By the late 1980s a shift toward an “efficiency model” of school effectiveness 
began to develop centering on the issue of context (Wimpelberg, Teddlie, & Stringfield, 
1989). With the emergence of the efficiency model, school effects began to be examined 
across a variety of context variables, such as the socioeconomic status (SES) of the 
students attending schools, grade levels of schools, and urbanicity of schools (Teddlie, 
1994). Other context differences studied include the following: size of school and 
district, school governance sector (public/private), subject matter context, and 
administrative context (voluntary/nonvoluntary orientation toward change) (Levine & 
Lezotte, 1990; Wimpelberg et al. 1989). 

School effectiveness research within the efficiency model has shown that schools 
are social systems and are bound together by a complex array of contextual variables 
(Teddlie & Stringfield, 1993). What works in one school may not work in another school 
and unless school improvement or reform efforts allow for this fact, attempts at 
transplanting standardized school improvement plans have very little chance of success. 
Despite these findings concerning contextual factors in schools, government officials at 
the state and national levels began to insist that the characteristics of effective schools be 
incorporated into school improvement programs. By 1989, the U.S. General Accounting 
Office (GAO) issued a report that showed that more than 50% of American schools had 
undertaken school improvement efforts (GAO, 1989), which were primarily based on the 




The Impact of Unlike Indicators 7 



Five-Factor Model (Edmonds, 1979). This is a direct result of the Hawkins-Stafford 
Elementary and Secondary School Improvement Amendments of 1988 (Public Law 100-297), 
which specifically mandated that the characteristics identified in the Five-Factor 
Model must be stressed in any improvement programs funded with Chapter 1 and 2 
monies (Teddlie & Stringfield, 1993). 

With a general public perception that schools are not adequately preparing 
students for life, educational reform has developed into a major “industry,” with new 
ideas arising from many areas. Restructuring, site-based management, outcomes-based 
education, magnet schools, redesign, total quality management, charter schools, 
vouchers, etc., have all been touted as the reform that will improve education in America. 
In Education Week, a national newspaper dedicated to educational issues, a series of 
articles detailed efforts to reform education. Included in that issue was a compilation of 
36 organizations, foundations, and companies that promote, and often sell, school 
improvement (School reform networks at a glance, Nov. 2, 1994, pp. 34-41). 

School Improvement Research 

Reynolds and his colleagues (1993) found that most of the commonalities between 
school effectiveness and school improvement research are found in practice and not 
theory. While over half of U.S. schools have introduced some form of school 
improvement based on some aspect of school effectiveness research (General Accounting 
Office, 1989; Taylor, 1990), most schools base their practices on the “five factor” models 
of Edmonds (1979) and Lezotte (1989), rather than the studies of Teddlie and Stringfield 
(1993) and Mortimore et al. (1988), which have demonstrated that a one-size-fits-all 
approach to school improvement will usually not be successful because each school has 
different needs in different contexts. 

Thousands of school improvement programs have been aimed at improving the 
schools, the teachers, and the students, and have been attempted by the federal, state, and 
local governments as well as private interests in the U.S. With so many agencies 
attempting so many approaches to school improvement, it is easy to understand why a 
complete survey of the literature would be so difficult. To expedite this venture, an 
outline by Sashkin and Egermeier (1992) will be utilized to identify three broad 
perspectives on school improvement in the U.S. They based their work on research of 
Chin and Benne (1969) and House (1981). The following describes the three 
perspectives. 

The rational-scientific perspective dominated attempts to improve schools from 
the 1950s to the 1970s. This perspective assumed that if people were given the necessary 
information to improve schools, they would use that information. 

The political perspective was best described by “strong external policy controls 
derived through processes of bargaining and political compromise among power groups,” 
and was found in many autocratic, state level reform initiatives of the early 1980s 
(Sashkin & Egermeier, 1992, p. 2). Four instruments used by states to effect change were 
mandates, inducements, capacity building, and system changing (McDonnell & Elmore, 
1987). 

A change in meanings and values within the organization that is undergoing 
change describes the cultural perspective. These cultural changes result in a 
“transformation” of the organization and are necessary because the status quo is 
preventing the school from improving (Moorman & Egermeier, 1992). 

Sashkin and Egermeier (1992) posit that four operational strategies are used to 
implement one or more of these perspectives. These strategies are fix the parts, fix the 
people, fix the school, or fix the system. 

Fix the parts implies that some part of the educational process is defective and can 
be identified and replaced with an innovation that will produce better results. This 
strategy is primarily based on the rational-scientific perspective. Many projects have 
been undertaken, particularly those that are federally funded, to study the processes by 
which school personnel receive information pertaining to new programs and how they 
adopt programs and practices that effect improvement. 

Many studies reflected positively on those attempts to improve, while many 
studies showed that those efforts to improve made little or no difference, and that often 
innovations were adapted or changed, or improvements disappeared when the money ran 
out (McLaughlin, 1990). In summary, the “fix the parts” strategy has proven that even if 
an innovation is successfully transferred into schools, improvement may not be the result 
(Sashkin & Egermeier, 1992). 

Fix the people is a strategy that is based on the idea that knowledge and skill 
improvement of teachers and administrators will allow them to better perform their roles 
and consequently better effect improvement in the schools (Sashkin & Egermeier, 1992). 
This strategy incorporates the rational-scientific perspective, but also incorporates the 
cultural perspective. 
While most of the research in this area centered on developing the staff, rather 
than determining whether the school improved as a result of the “developed” staff 
(Freeman, 1997), other researchers took different approaches to the analysis of staff 
development. Fullan (1990) sought to link staff development to institutional 
development and identified several approaches to staff development. Levine and Lezotte 
(1990) concluded that practice-oriented staff development was more effective than “one 
shot” inservice training programs, and Stedman (1987) found several common elements 
in staff development at unusually effective schools. 

The fix the school strategy centered on developing organizations’ capacity to 
solve their own problems. This concept arose from the field of 
practice known as “organizational development,” or OD. With OD, efforts are aimed at 
helping members of organizations recognize problems confronting the whole 
organization rather than dealing with problems that affect parts of the organization 
(Sashkin & Egermeier, 1992). 

Fullan, Miles, and Taylor (1981) reviewed OD practices in schools and 
recommended that this approach should only be used when a school or district meets 
certain “readiness criteria.” A variety of OD-based school improvement models have 
been implemented since that review, some of which have been successful (Freeman, 
1997). 

The last strategy, fix the system, focuses on restructuring or comprehensive 
school change. Comprehensive restructuring encompasses the first three strategies and 
includes the community, the school district, state education agencies, professional 
development institutions, and also federal agencies (Sashkin & Egermeier, 1992). The 
term restructuring might be a poor choice to describe this strategy because the word 
means so many different things to different people. However, research doesn’t give a 
clear indication of the effectiveness of this approach (McDonnell, 1990). 

There are at least two components that seem to appear in the restructuring 
literature. First, restructuring means that a coherent system exists to push authority down 
to the lowest level (Bailey, 1992). Second, restructuring involves a basic change in 
accountability, and this in turn relates to a set of changes in the “governance” of schools 
(Murphy, 1990). 

Merging the Two Disciplines 

The reviews of the school effectiveness research literature and the school 
improvement research literature indicate that the two fields have developed from 
different places both methodologically and theoretically (Gray, Reynolds, Fitz-Gibbon, & 
Jesson, 1996). Table 1 provides a generalization of the contrasts between school 
effectiveness and school improvement, as proposed by Reynolds et al. (1993). 

Despite the differences between the two fields, recently researchers from both 
disciplines have called for a synthesis of school effectiveness and school improvement 
research. For example, Mortimore (1991) called for transferring the “energy, knowledge, 
and skills of school effectiveness research to the study of school improvement” (p. 223). 
Stoll and Fink (1992) stated, “it is only when school effectiveness research is merged 
with what is known about school improvement, planned change, and staff development, 
that schools and teachers can be empowered and supported in their growth toward 
effectiveness” (p. 104). In addition, Murphy (1992) has called for change that will 
realize the potential of conventional school improvement and also the more radical 
restructuring of the entire educational system, including its power relations, and the 
teaching-learning processes in schools. Furthermore, the international journal, School 
Effectiveness and School Improvement, in its mission statement argued for “empirical 
rationality” in assessing the validity of models in both school effectiveness and school 
improvement (Creemers & Reynolds, 1990). 

Table 1 

The Separate Traditions of School Effectiveness and School Improvement 
(Reynolds et al., 1993, p. 44) 

SCHOOL EFFECTIVENESS                      SCHOOL IMPROVEMENT IN THE 1980s 

Focus on schools                          Focus on individual teachers or groups 
                                          of teachers 

Focus on school organization              Focus on school processes 

Data driven, with emphasis on             Rare empirical evaluation of effects of 
outcomes                                  changes 

Quantitative in orientation               Qualitative in orientation 

Lack of knowledge about how to            Concerned with change in schools 
implement change strategies               exclusively 

More concerned with change in pupil       More concerned with the journey of 
outcomes                                  school improvement than its 
                                          destination 

More concerned with schools at a point    More concern with schools as 
in time                                   changing 

Based on research knowledge               Focus on practitioner knowledge 
With this increased emphasis on attempting to combine school effectiveness and 
school improvement research, Reynolds et al. (1993) have developed a series of 
suggestions that would facilitate this merger. 

1. Develop more case studies in school effectiveness research so that the 
transfer of knowledge to the school improvement community (with its 
emphasis on qualitative data) will be more relevant. 

2. School effectiveness research should put more emphasis on process 
factors such as attitudes, values, relationships, and climate, which are 
needed by school improvement research. 

3. School effectiveness research tends to take “snapshots” of schools rather 
than taking moving pictures of schools over time. School improvement 
research needs to know how schools became effective or ineffective to 
know how to replicate the process. 

4. More emphasis should be placed on studying the variable of principal 
leadership outside the U.S. 

5. Most school effectiveness research neglected the potential impact of 
other layers above the school level. There is evidence in school 
improvement research that these other layers may be crucial to generating 
improvement. 

6. School effectiveness research should attempt to isolate direction and 
strength of the influences that link school process variables together. 

7. School effectiveness research should attempt to determine which process 
variables are causes of school effectiveness. For example, does high 
teacher expectation cause improved student performance, or does high 
student performance cause higher teacher expectations? 

8. Dated school effectiveness research from the 1980s may not be sufficient 
to address school improvement schemes of the 1990s. Therefore, it is 
important to make sure that the factors that identified effectiveness in the 
past are still relevant today. 

9. Context research in school effectiveness has only been utilized a short 
time. At the present stage, the results are not specific enough to assist 
school improvement research in determining what will work in different 
schools. 

10. The knowledge required of improvers of ineffective schools is not found 
in school effectiveness research. Assuming that what works in an 
effective school will work in an ineffective school is not sufficient. 

11. School improvement research needs to address the impact of innovations 
upon student performance or outcomes. Without these data, 
understanding the causal relationships between school processes and 
outcomes is impossible. 

12. School improvement strategies need to move away from whole-school 
programs, based on evidence from school effectiveness research that 
indicates that schools can have differential effects on students (Nuttall, 
Goldstein, Prosser, & Rasbash, 1989). School improvement programs 
should vary within the school in terms of their content, their focus, and 
their targeted population. 
13. School improvement researchers need to concentrate on why changes 
occurred more than on how much change occurred. 

14. School improvement researchers need to address the class level and the 
school level. Many school improvement programs disregard the nature 
of instructional practices altogether. 

With this list of criteria for merging the two disciplines firmly planted in the 
literature today, various researchers have begun the task of meeting these criteria. Gray 
et al. (1995) made one such attempt, in which they sought to explore an aspect of school 
improvement from the perspective of school effectiveness. They attempted to answer the 
question, how much do schools change in terms of their effectiveness over a number of 
years? The answer to this question can facilitate a better framework for studying the 
mechanisms or processes of school improvement. 

The reason that there are so few of these studies is the contrary positions that the 
two disciplines hold regarding the stability of effectiveness indicators. School effectiveness 
research has sought to strengthen its findings by initiating multiple years of data 
collection for the purpose of replication. If a lack of stability in those indicators is 
detected, the validity of the school effectiveness study is threatened. On the other hand, 
instability is necessary for school improvement research to take place. Improvement can 
only occur where change is present. The only way to approach this situation, then, is to 
develop an orientation toward change. 

Since much of the early school effectiveness research indicates a high level of 
stability over time, the degree of change was not readily apparent. So, Gray et al. (1995) 
set about to determine how much schools change over time in regard to their school 
effectiveness indicators. In that study, Gray and his colleagues (1995) found that when 
looking at a school’s level of effectiveness in Year 1, and then again in Year 3 (see the 
methodology section for a detailed description of the methods employed in this study) 
around 68% of the schools were similarly categorized (see Table 2). In other words, in 
roughly two of three schools, there was no change in effectiveness levels over the three- 
year period. Likewise, roughly one third of the schools experienced a measurable change 
in effectiveness levels, that is, they either improved over time, or they declined in their 
level of effectiveness. The results indicate that roughly half (18% of the schools) of the 
changers improved and half (15% of the schools) declined in their effectiveness level. 
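
The categorization described above can be illustrated with a short sketch. The code below is not the procedure Gray et al. (1995) used. It assumes, purely for illustration, that each school is placed in one of three ordered effectiveness bands in Year 1 and again in Year 3, and it tallies the share of schools that improved, stayed stable, or declined. The band names, the function names, and the six example schools are all hypothetical.

```python
from collections import Counter

# Hypothetical effectiveness bands, ordered from least to most effective.
BANDS = ["low", "middle", "high"]


def classify_change(year1_band, year3_band):
    """Return 'improved', 'declined', or 'stable' for one school."""
    start = BANDS.index(year1_band)
    end = BANDS.index(year3_band)
    if end > start:
        return "improved"
    if end < start:
        return "declined"
    return "stable"


def change_shares(schools):
    """Map each change label to its share of schools (0.0 to 1.0)."""
    counts = Counter(classify_change(y1, y3) for y1, y3 in schools.values())
    total = len(schools)
    return {label: counts.get(label, 0) / total
            for label in ("improved", "stable", "declined")}


# Made-up data for illustration only: school -> (Year 1 band, Year 3 band).
example_schools = {
    "A": ("low", "low"),
    "B": ("low", "middle"),
    "C": ("middle", "middle"),
    "D": ("high", "middle"),
    "E": ("high", "high"),
    "F": ("middle", "middle"),
}

shares = change_shares(example_schools)
print(shares)
```

Comparing only the first and last year, as this sketch does, says nothing about what happened in between; a school labeled stable may still have fluctuated in Year 2.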

Of course, Gray et al. (1995) were aware that the framework for reporting these 
changes in effectiveness levels was what they called “rule-of-thumb.” However, when 
taking a more detailed look at individual schools in the study, only one of the 30 schools 
had improved consistently over the three-year period. Since only three years of data were 
used in the study, the results indicate that these were linear trends. The researchers 
cautioned against assuming non-linearity from just three years, but by the same token, 
non-linearity could not be ruled out. 

Other findings from the Gray et al. (1995) study related to the issues of change in 
effectiveness levels over time are as follows: 

1. Based on the evidence presented in the study, there appears to be only a small 
portion of schools in any area that will change (improving or declining) 
significantly. They noted that this number should be between one-fifth and 
one-fourth of the schools (which the present paper seeks to replicate, using 
other methods for measuring effectiveness levels). How much of this change 
can be attributed to actual improvement in the school and how much is 
attributable to changes in pupil population (higher achieving students), is not 
answered by this study, but is a question that should be answered in further 
studies. 

2. Changes in effectiveness levels are likely to appear modest, so researchers 
need to be aware of the sizes of changes that occur, and the potential 
significance in educational terms. For instance, consistent improvement over 
five years could turn an ineffective school into a relatively effective one. 

3. With only three years’ data the researchers refrained from speaking of trends. 
Each additional year’s results will bring this to light more clearly. Also, the 
data were presented as a linear process, when in fact other studies (Gray et al., 
1999) have indicated that change may be inconsistent over time. Fast 
improvement in the first years of an improvement project may slow down in 
later years, or reverse itself. It is important to understand that each school has 
its own unique “natural history” of change. 

4. The extent to which changes in effectiveness are dependent on the outcome 
measures used needs to be examined. (This paper seeks to address this point 
by noting the levels of change in two other venues that use differing measures 
for effectiveness levels.) The fear is that aggregate measures may mask 
significant improvement in a particular subject area of the school, while 
showing that the school overall declined in effectiveness. The researchers also 
remind us that improvements in exam scores may not go hand-in-hand with 
other kinds of change (see the discussion in this paper about high-stakes 
testing, as detailed by the American Educational Research Association). 

5. The last thing detailed by Gray et al. (1995) was that we need to begin 
investigating the causes of change in particular schools, relative to other 
schools’ levels of effectiveness. The correlates of improvement (changes in 
effectiveness) must be employed, rather than the correlates of effectiveness 
that dominate the school effectiveness literature. They referred back to the 
classic work of Purkey and Smith (1983) that asked such questions as “Were 
different strategies needed for low-achieving schools to raise their level of 
effectiveness, and for high-achieving schools that were beginning to decline in 
effectiveness?” And “what is needed to maintain a school’s success once it is 
deemed to be academically effective?” In school improvement research, sites 
that are deemed to be ineffective are just as interesting as sites of highly 
effective schools. 

Methodology 

The methodology used to provide a comparison of school effectiveness levels 
required the establishment of an acceptable school effectiveness indicator in each study. 
Although school effectiveness research has developed an extensive literature base over 
the past 25 years, the main problem for researchers has remained the lack of a universally 
accepted method of classifying schools based on the criterion of effectiveness (Good & 
Brophy, 1986; Levine & Lezotte, 1990; Purkey & Smith, 1983; Rowan et al., 1983). The 
most widely utilized technique, the regression model, which yields school effectiveness 
indicators (SEIs) based on residual scores, has shown problems in terms of the stability of 
effectiveness measures over time (Mandeville & Anderson, 1987; Purkey & Smith, 1983; 
Rowan et al., 1983). However, for more than 20 years, since Dyer, Linn, and Patton 
(1969) attempted to control for student context and demographic variables, the regression 
model has been the most frequently used technique for establishing SEIs (Lang, 1991; 
Mandeville & Heidari, 1988). 
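
As a minimal sketch of how a regression-based SEI can be computed, the following Python fragment fits a one-predictor ordinary least squares model to school-level data and returns each school's residual (actual minus predicted score). The function name `residual_seis`, the single predictor, and the data values are hypothetical and serve only as illustration; the studies discussed here used their own predictors and software.

```python
from statistics import mean


def residual_seis(predictor, outcome):
    """Return OLS residuals (actual minus predicted) for a one-predictor model.

    Each residual plays the role of a school effectiveness indicator:
    positive means the school scored above what the predictor alone suggests.
    """
    x_bar, y_bar = mean(predictor), mean(outcome)
    sxx = sum((x - x_bar) ** 2 for x in predictor)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(predictor, outcome))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(predictor, outcome)]


# Hypothetical school-level data: an SES index and a mean achievement score.
ses = [1.0, 2.0, 3.0, 4.0, 5.0]
achievement = [52.0, 55.0, 61.0, 60.0, 67.0]
seis = residual_seis(ses, achievement)
```

Because the residuals from a least squares fit with an intercept sum to zero, schools are ranked relative to one another rather than against an absolute standard.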

In the U.K., these regression-based SEIs are known as “value-added” scores (Fitz-Gibbon, 
1996). While some advocate the use of more advanced multilevel models for the 
generation of SEIs, such as utilized by Gray et al. (1995), research shows that multilevel 
models (focusing on the school level) and regression models (with the school as the unit 
of analysis) yield similar statistics (Kennedy, Teddlie, & Stringfield, 1993; Fitz-Gibbon, 
1996). 

Of some concern in the present study is the issue of consistency of school 
effectiveness indicators. When the model used as a school effectiveness indicator 
changes, often the classification of the school changes as well. For example, if the 
student achievement measurement used as the criterion variable in a regression model is 
based on reading scores, the school effectiveness classification may be different from the 
classification based on mathematics scores (Witte & Walsh, 1990). Therefore, Purkey 
and Smith (1983) felt that using only one subject area or grade level as the measure of 
student achievement gave a very limited view of a school's effectiveness. Mandeville 
and Anderson (1987) reported finding no “appreciably higher” consistency of scores with 
a combined reading-mathematics score, but stated that a composite should provide 
increased reliability. Crone, Lang, Teddlie, and Franklin (1995), using a combined 
language arts/mathematics score as the criterion variable in the regression model, found 
agreement between quantitative and qualitative results, further supporting the idea that a 
combination score provided higher reliability in measuring effectiveness. 

The main thrust of this study was to replicate a procedure conducted by Gray and 
his colleagues (1995) and Freeman and Teddlie (1996) to study changes in schools’ 
effectiveness over time. They grouped schools’ residuals into the top quarter, middle half, 
and bottom quarter (see Table 2). Regarding the charts, Gray et al. (1995) put their 
information “in a form that can be readily grasped by the non-statistical reader” (p. 108). 
That statement seemed very applicable for this study also. 

The methodologies employed by the three studies used for comparison in this 
paper will be presented first, followed by a more specific description of how the 
comparisons are made. 

Gray et al. (1995) 

In this study, the researchers used data from three cohorts of secondary students in 
Great Britain. The student outcome measure was based on a national exam (GCSE) 
taken by all students aged 16 and above in a variety of different subjects. The students 
attended 30 different schools of varying organizational and governance types. Complete 
data were obtained on 7,829 pupils, including test scores, prior attainment, pupils’ gender, 
and school contextual data. A series of multi-level analyses utilizing a linear model that 
included the listed variables was developed. This analysis provided estimates of 
school-level residuals over the three-year period. These residuals were grouped into the top 
quarter, the middle half, and the bottom quarter for Years 1 and 3. The results are presented in 
Table 2. 

Freeman and Teddlie (1996) 

In this study, the school effectiveness indicator was established by using a 
regression model with a composite student achievement score (SIP scores) as the 
criterion variable and two predictor variables (student SES and community type). The 
result of the regression model was a set of actual and predicted scores for each school. 
The difference between these two scores, the residual, was assigned and used to compare 
schools. The schools examined in this study numbered 634. As in Gray et al. 
(1995), the schools were grouped by school effectiveness level into top 
quarter, middle half, and bottom quarter in Years 1 and 3. The results are presented in 
Table 2. 

Present Study 

Alabama does not have a composite score across grade level and subject area, so 
SAT average scores were used from each school as effectiveness indicators for this study. 
The state also uses these scores to evaluate each school and classifies each school as 
being academic clear, academic caution, or academic alert, based on a score derived from 
the SATs. The definition of each of these terms and how they are determined are 
included later in the paper. 

The average SAT scores for all elementary schools, as reported by the Alabama 
State Department of Education on its website, were tabulated and examined to 
determine which schools had improved, remained stable, or declined from Year 1 to 
Year 3. Additionally, average SAT scores were used for comparison because there seems 
to be some agreement among researchers that increased reliability of a school’s 
effectiveness will result from using multiple scores (Purkey & Smith, 1983; Mandeville 
& Anderson, 1987). 

All elementary schools that had an SAT score for 1997 and 1999 were imported 
into SPSS and divided into top quarter, middle half, and bottom quarter for both years, in 
the same way as in the other two studies. Scores from 593 elementary schools 
were used in this study. 

A matrix similar to those of Gray et al. (1995) and Freeman and Teddlie 
(1996) was constructed and used to list each school’s ranking for Year 1 and its ranking for Year 3 
(see Table 2). Cells 4, 7, and 8 identified those schools that improved, while cells 2, 3, 
and 6 identified those schools that declined in their scores. Cells 1, 5, and 9 identified 
those schools that were stable. Schools that had scores in the middle half in Year 1 and 
again in Year 3 were considered stable even though their average SAT scores might have 
increased or decreased. 
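
The banding and cross-tabulation described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the SPSS procedure used in the study: the function names, the tie-handling (ties keep their input order), and the eight-school data set are all hypothetical. Rows are assumed to index the Year 1 band and columns the Year 3 band, following the layout of Table 2.

```python
def bands(scores):
    """Assign each school a band: 0 = top quarter, 1 = middle half, 2 = bottom quarter."""
    n = len(scores)
    # Rank schools from highest to lowest score (ties keep input order).
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    result = [0] * n
    for rank, school in enumerate(order):
        if rank < n / 4:
            result[school] = 0
        elif rank < 3 * n / 4:
            result[school] = 1
        else:
            result[school] = 2
    return result


def change_matrix(year1, year3):
    """Count schools by (Year 1 band, Year 3 band) in a 3 x 3 matrix."""
    cells = [[0] * 3 for _ in range(3)]
    for row, col in zip(bands(year1), bands(year3)):
        cells[row][col] += 1
    return cells


def summarize(cells):
    """Return (stable, improved, declined) counts from the 3 x 3 matrix."""
    stable = sum(cells[i][i] for i in range(3))                # diagonal
    improved = cells[1][0] + cells[2][0] + cells[2][1]         # moved to a higher band
    declined = cells[0][1] + cells[0][2] + cells[1][2]         # moved to a lower band
    return stable, improved, declined


# Hypothetical average scores for eight schools in Year 1 and Year 3.
year1 = [90, 85, 80, 75, 70, 65, 60, 55]
year3 = [88, 70, 86, 74, 72, 50, 73, 52]
matrix = change_matrix(year1, year3)
summary = summarize(matrix)
```

Note that a school counted as stable here may still have raised or lowered its average score, provided it stayed within the same band in both years.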

Since Alabama is in the midst of an educational reform effort to force schools to 
improve or face various negative sanctions, it seemed to be an opportune time to examine 
the specific indicators of effectiveness that are being used by the Alabama State 
Department of Education. By using an analysis process similar to that of Gray et al. 
(1995) and Freeman and Teddlie (1996), we can determine the degree of change in 
Alabama schools. This will give the percentage of schools that are improving, declining, 
and remaining stable based on the SAT average scores. 

Since this study examines change that is assessed by “high-stakes” testing, it is 
appropriate to consider some of the problems associated with this type of assessment and 
the accountability policies that result. The Alabama accountability program, with its 
definitions and policies, is included in Appendix A. 

Accountability 

Reform efforts seem to be a constant in educational circles, and in order to have 
reform, evaluation of practices and products must occur. The underlying assumption of 
that premise is that the product of educational practices can be quantified and assessed 
accurately. 

In response to public and policy demands for accountability, numerous efforts 
have been directed toward identifying desirable “effects” and then finding something to 
cause those effects. Increasingly, then, we see that standards and performance indicators 
are turned into restrictive evaluation measures that often have no regard for the diverse 
contextual realities in which our schools exist. Accountability in these circumstances 
assumes that one set of standards can be equitably applied to everyone, regardless of 
context. 

Einstein is credited with saying, “No problem can be solved from the same 
consciousness that created it.” If that statement is true, then educators must develop new 
ways of viewing and understanding our problems. Our old models assume that schools 
function within stable environments in which our students can be evaluated by the same 
set of prescriptive standards. 

Linn (2000) suggests several reasons for the great appeal of assessment through 
high-stakes testing and the accountability policies that often result: 
testing and assessment are cheap, can be externally mandated, can be rapidly 
implemented, and produce visible results. He also contends that research supports the notion 
that test scores will increase in the first few years of a program with or without 
improvement in broader constructs. 

AERA, in its position statement on high-stakes testing (2000), points out that 
many policymakers support high-stakes testing with the intention of improving 
education. These supporters of high-stakes testing hope that setting higher standards will 
inspire greater effort by everyone involved in the educational process. The policy 
statement also points out that with these tests there is potential for serious harm. It is 
easy to understand how high test scores, rather than learning, become the overriding goal 
of classroom instruction. 

As we examine Alabama’s SAT scores, it is perhaps fitting to note that, like many 
other states that use this norm-referenced test, some of the material that is tested is not 
included in the mandated state curriculum. Hence, teachers must take classroom time to 
cover non-required material to be tested by the SAT, because administrators, parents, 
students, community leaders, and perhaps the teachers themselves, are all going to judge 
that teacher’s ability to teach based on that score. 

Results 

Freeman and Teddlie (1996) noted “it is the fact that there were so many 
dissimilarities between the studies that makes the similarities in results ... interesting” 
(p. 18). Even though this study is somewhat similar to Freeman and Teddlie’s study, and 
very different from the Gray study, the results are interestingly similar. To summarize 
the three studies, there are differences in sample sizes, basis for SEI, statistical analyses, 
and school configuration. 

Table 2 compiles the resulting change from the three studies. In each box, the top 
score is from the Gray et al. (1995) study; the middle is the Freeman and Teddlie (1996) 
score, and is indicated by italics; and the bottom score is from the present study and is 
indicated by bold numbers. Cells 1, 5, and 9 indicate the percentages of schools that did 
not change in their levels of effectiveness over time. These schools are considered to be 
stable. Cells 4, 7, and 8 represent the schools that improved in their levels of 
effectiveness, while cells 2, 3, and 6 represent the schools that declined. 

By examining the stable school percentages as represented in cells 1, 5, and 9, we 
find that Gray et al. (1995) found that 68% of the schools did not change. Freeman and 
Teddlie (1996) found around 64% of the schools in that study had failed to change, but 
the present study found that 77% of the schools had not changed in their levels of 
effectiveness. These findings reveal that the predictions made by Gray et al. (1995), that 
in any set of schools only one-fifth to one-fourth of the schools will have changed in their 
levels of effectiveness over time, were true for this study also. All three of the studies 
had results that fell roughly within the predicted range. 

It is interesting that the percentage of schools that remained stable in the present 
study was 13 percentage points higher than in the Freeman and Teddlie (1996) study. While it is difficult 
to say whether this difference is significant, it does raise the issue of consistency of SEIs. 
Since the present study results were based on averages of raw test data, it raises the 
question of whether this SEI is more or less valid than regression-based SEIs. The 
literature tells us that the regression model is an acceptable method of establishing SEIs. 

Table 2 

Comparative changes among schools in three studies 

                                      Score from year three 
Score from year one          Top Quarter    Middle Half    Bottom Quarter 

Gray et al. (1995) 
  Top Quarter                    15%             6%              0% 
  Middle Half                     9%            35%              9% 
  Bottom Quarter                  0%             9%             18% 

Freeman and Teddlie (1996) 
  Top Quarter                  15.62%          8.04%           1.42% 
  Middle Half                   8.83%         32.81%           8.20% 
  Bottom Quarter                0.63%          9.15%          15.30% 

Present Study 
  Top Quarter                    19%             7%             <1% 
  Middle Half                     6%            38%              5% 
  Bottom Quarter                 <1%             6%             20% 

These results may reveal that using average raw scores as SEIs is less valid and should be 
avoided in classifying schools in high-stakes accountability programs, such as found in 
the state of Alabama. 

Cells 4, 7, and 8 indicate the percentage of schools that improved in their levels of 
effectiveness over time. Gray et al. (1995) found that 18% of the schools in their study 
had improved, while Freeman and Teddlie (1996) found 18.6% had improved. The 
present study found that roughly 12% of the schools had improved. The first two studies 
were almost identical in the percentage of schools that improved, while the present study 
had about 6 percentage points fewer schools that improved. 

Cells 2, 3, and 6 indicate the percentage of schools that had declined in 
effectiveness levels over time. Gray et al. (1995) found that 15% of their schools 
declined, while Freeman and Teddlie (1996) found roughly 18% had declined, and the 
present study had 13% of the schools that declined in effectiveness levels over time. 
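
The stable, improved, and declined percentages can be checked directly against the cell values in Table 2. The short Python sketch below does so for the two studies whose cells are all reported as exact figures; the present study is left out because two of its cells are given only as "<1%". The row-by-row cell numbering (cell 1 at top left, cell 9 at bottom right) is inferred from the totals reported in the text, and the names `TABLE_2` and `totals` are hypothetical.

```python
# Rows: band in year one; columns: band in year three (top, middle, bottom).
TABLE_2 = {
    "Gray et al. (1995)": [
        [15, 6, 0],
        [9, 35, 9],
        [0, 9, 18],
    ],
    "Freeman and Teddlie (1996)": [
        [15.62, 8.04, 1.42],
        [8.83, 32.81, 8.20],
        [0.63, 9.15, 15.30],
    ],
}


def totals(m):
    """Return (stable, improved, declined) percentages for one study."""
    stable = m[0][0] + m[1][1] + m[2][2]      # cells 1, 5, 9
    improved = m[1][0] + m[2][0] + m[2][1]    # cells 4, 7, 8
    declined = m[0][1] + m[0][2] + m[1][2]    # cells 2, 3, 6
    return stable, improved, declined


for study, cells in TABLE_2.items():
    stable, improved, declined = totals(cells)
    print(f"{study}: stable {stable:.2f}%, improved {improved:.2f}%, "
          f"declined {declined:.2f}%")
```

For the Gray et al. (1995) rows the three totals add to 101% rather than 100%, which is consistent with the cells having been rounded to whole percentages.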

In terms of movement or change in effectiveness status, the results again are very 
similar, with the present study indicating that fewer schools were improving than in the other 
two studies. Again, this may be attributable to the fact that the SEI used in the present 
study may be less valid than those used in the other two studies. Of course, the minor differences in 
the change in effectiveness levels over time may be more attributable to variables 
other than any specific school improvement efforts in the individual schools. 

Conclusions 

The present study sought to partially replicate earlier studies that explored the 
degree of change in school effectiveness levels over time. The following conclusions can 
be made from the results. 

1) The range of percentages for schools that change, as predicted by Gray et al. 
(1995), one-fifth to one-fourth, with roughly half improving and half 
declining, is similar to that found in the other two studies. This strengthens the 
notion that, given a set of schools, it is predictable how many schools will be 
changing. 

2) The differences found in the present study, while not necessarily significant, 
may indicate that different criteria and methods for establishing SEIs will produce 
unlike results. Further analysis of Alabama’s test scores, in which the same data 
are analyzed by different methods, will bring about a clearer picture in this 
area. 

3) All three studies reveal that less than 20% of schools improve over time, and 
it is very rare that a school would move from the bottom quarter to the top 
quarter over a three-year period. This indicates that, in reference to school 
effectiveness research, ineffective schools will take, at a minimum, more than 
three years to become effective. Therefore, any school improvement project 
should not expect to see immediate results. Unfortunately, many projects are 
abandoned after the first year or two if the school shows no sign of 
improvement. 

4) All three studies reveal that less than 20% of schools decline over time, and it 
is rare that a school would move from the top quarter to the bottom quarter 
over a three-year period. This indicates that schools decline at roughly the 
same rate as schools improve. 

5) While roughly 65% to 75% of the schools in all three studies remained stable 
in their effectiveness levels over time, a closer look at SEIs indicates that this is 
not a linear process. Some schools improved the second year and declined by 
the third year. This means that the effects of school improvement efforts may 
not be consistent. It may indicate that school improvement efforts have 
to adjust as a school makes its way up the effectiveness scale. If adjustments 
are not made, a school may begin to decline. 

6) The present study indicates that the schools in Alabama, although involved in 
a statewide accountability program involving a high-stakes school 
improvement mandate, are not improving at any greater rate than the schools in the other 
studies, which were not involved in such a program. This lends more credence to 
the notion that, regarding school improvement efforts, one size does not fit all. 

Appendix A 

From the Alabama Administrative Code, 
Education Accountability, Chapter 290-4-1 



Education Accountability. 

(1) The State Superintendent of Education is authorized to carry out the review, 
examination and supervisory responsibilities as prescribed in the Code of Ala. 1975, and to 
require reasonable and appropriate reports and to conduct hearings for the purpose of ensuring 
that due process requirements are met. 

(2) Academic Assistance Program. 

(a) Academic Alert - Local Schools. Local superintendents and local boards shall 
commit the resources necessary to improve the instructional program for schools on Academic 
Alert and shall budget to those schools all funds earned by the schools in the cost calculations of 
the foundation program. 

1. Phase 1 (Self-study). Following the spring 1996 administration of a nationally 
normed achievement test and thereafter, the State Department of Education (SDE) will identify 
every school in Alabama with a majority of its students scoring in Stanines 1, 2, and 3 and notify 
said schools that they are being placed on Academic Alert. Schools placed on Academic Alert 
shall engage in a self-study to examine the reasons for low student achievement and shall develop 
a school plan for improvement. The SDE will assist local schools on Academic Alert in 
developing improvement plans and will also offer staff development. 

2. Phase 2 (Outside Academic Improvement Teams). Following the spring 1997 
administration of a nationally normed achievement test and thereafter, the SDE will identify the 
schools on Academic Alert from the previous year that have shown insufficient improvement in 
student achievement and place an Academic Improvement Team in each affected school. These 
teams of practicing professionals from outside the school shall visit each school on Academic 
Alert; conduct a study for improvement; consult with faculty, staff, and the community; analyze 
causes of poor student achievement; and make specific recommendations for improvement of 
student academic performance. 

3. Phase 3 (Intervention). Following the spring 1998 administration of a nationally 
normed achievement test and thereafter, the SDE will identify the schools on Academic Alert 
from the previous two years that have shown insufficient improvement, and the State 
Superintendent of Education will appoint a person or persons from outside the school to run the 
day-to-day operations of the school. In considering intervention, the State Superintendent shall 
include factors such as dropout rates, attendance rates, special education enrollment, and other 
data necessary to properly interpret student achievement in each school. 

(b) Academic Alert - Local School Systems. Following the spring 1996 
administration of a nationally normed achievement test and thereafter, the SDE will identify 
every school system in Alabama with either a majority of its schools scoring Academic Alert or a 
majority of the students within a school system in which the students are scoring in Stanines 1, 2, 
and 3 and notify said school systems that they are being placed on Academic Alert. School 
systems placed on Academic Alert will follow the same procedures and be subject to the same 
accountability measures as identified in paragraph (a) 1., 2., and 3. for individual 
schools on Academic Alert. 

(c) Academic Caution - Local Schools and Local School Systems. 

1. Following the spring 1996 administration of a nationally normed achievement 
test and thereafter, the SDE will identify every school in Alabama that has not been placed on 
Academic Alert but has a majority of its students scoring in Stanines 1, 2, 3, and 4 and every 
school system in Alabama that has not been placed on Academic Alert but has either a majority 
of its schools scoring Academic Alert or Academic Caution or a majority of the students within a 
school system in which the students are scoring in Stanines 1, 2, 3, and 4 and notify said schools 
and school systems that they are being identified as an Academic Caution school or school 
system. Schools and school systems classified in Academic Caution must show annual 
improvement on a nationally normed achievement test. If insufficient improvement is 
demonstrated between the first and second administration and thereafter, the school and/or system 
shall be governed by the sanctions for schools and systems on Academic Alert. The SDE will 
offer services to these schools and school systems regarding staff development and academic 
improvement. 

2. The State Superintendent of Education shall have the authority to investigate the 
progress of schools and/or school systems within the category of Academic Caution which have 
demonstrated insufficient improvement and make a determination regarding placement in 
Academic Alert or Academic Caution. 

(d) Academic Clear - Local Schools and Local School Systems. Following the 
spring 1996 administration of a nationally normed achievement test and thereafter, the SDE will 
identify every school in Alabama with a majority of its students scoring in Stanines 5-9 and notify 
said schools that they are being identified as an Academic Clear school. Each school system with 
a majority of the system’s students scoring in Stanines 5-9 shall be declared an Academic Clear 
system. The SDE will offer services to these schools and school systems regarding staff 
development and academic improvement. 

(e) All references to achievement/improvement relative to scores and status on 
norm-referenced test results shall be as follows: 

1. Academic Alert Schools and School Systems. Schools and/or school systems 
scoring in Academic Alert that show a decrease of at least five in the percent of students scoring in 
Stanines 1, 2, and 3 will be considered to have made sufficient yearly progress or improvement as 
required by Act 95-313. School systems in Academic Alert by virtue of a majority of schools 
scoring in Academic Alert will follow Rules 290-4-1-.01(2)(e)(1) and (f)(1-4). 

2. Academic Caution Schools and School Systems. Schools and/or school systems 
scoring in Academic Caution that show a decrease of at least two in the percent 
of students scoring in Stanines 1, 2, 3, and 4 will be considered to have made 
sufficient yearly progress or improvement. Schools and/or school systems 
achieving this standard remain in Academic Caution unless the percent of 
students scoring in Stanines 1, 2, 3, and 4 moves them to Academic Clear. 

Schools and/or school systems failing to make sufficient progress will be placed 
in Academic Alert Phase 1. School systems in Academic Caution by virtue of a 
majority of schools scoring in Academic Alert or Academic Caution will follow 
Rules 290-4-1-.01(2)(e)(2) and (f)(1-4). 
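
The school-level classification and yearly progress rules quoted in this appendix can be sketched in Python as follows. This is an illustrative reading, not an official implementation, and it rests on several assumptions: "majority" is taken to mean more than 50% of students, "a decrease of at least five (or two) in the percent" is taken to mean percentage points, the school-system rules based on a majority of schools are omitted, and the function names and the "Unspecified" label for the exact 50/50 boundary case are hypothetical.

```python
def classify(stanine_pct):
    """Classify a school from a dict mapping stanine (1-9) to percent of students."""
    low3 = sum(stanine_pct.get(s, 0) for s in (1, 2, 3))
    low4 = low3 + stanine_pct.get(4, 0)
    high = sum(stanine_pct.get(s, 0) for s in range(5, 10))
    if low3 > 50:                 # majority in Stanines 1, 2, and 3
        return "Academic Alert"
    if low4 > 50:                 # not Alert, but majority in Stanines 1-4
        return "Academic Caution"
    if high > 50:                 # majority in Stanines 5-9
        return "Academic Clear"
    return "Unspecified"          # boundary case the quoted text does not cover


# Required drop, in assumed percentage points, in the low-stanine share.
REQUIRED_DECREASE = {"Academic Alert": 5, "Academic Caution": 2}


def sufficient_progress(status, previous_low_pct, current_low_pct):
    """Check yearly progress for a school on Academic Alert or Academic Caution.

    The percentages refer to Stanines 1-3 for Alert and Stanines 1-4 for Caution.
    """
    return previous_low_pct - current_low_pct >= REQUIRED_DECREASE[status]


# Hypothetical school: 55% of students in Stanines 1-3.
example_status = classify({1: 20, 2: 20, 3: 15, 4: 10, 5: 35})
```

Under this reading, a school on Academic Alert whose share of students in Stanines 1-3 falls from 60% to 55% would meet the yearly progress standard, while a fall from 60% to 56% would not.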